2. Administering Content Sources
There are four cmdlets you can use to manage content sources.
Get-SPEnterpriseSearchCrawlContentSource
Set-SPEnterpriseSearchCrawlContentSource
New-SPEnterpriseSearchCrawlContentSource
Remove-SPEnterpriseSearchCrawlContentSource
Just as with any other Windows PowerShell cmdlet, you can use Get-Help to learn more about them. For example, type Get-Help new-spe*content*
to get more information about how to use
New-SPEnterpriseSearchCrawlContentSource. Using wildcards reduces the
amount of typing necessary.
To administer content
sources, you first need to know which content sources are available. In
Windows PowerShell, you can display a list of content sources and the
status of the crawling for a specific Search Service Application with
the following command (which is followed by an output example).
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $searchapp
Name Id Type CrawlState CrawlCompleted
---- -- ---- ---------- --------------
Local SharePo... 2 SharePoint Idle 05/01/2010 23:58:20
FilesShares 3 File Idle
Old Intranet 5 Web CrawlStarting
where the variable $searchapp contains a reference to a Search Service Application. This variable can be initialized by typing a command similar to
$searchapp = Get-SPEnterpriseSearchServiceApplication |
where {$_.name -eq "Search Service Application"};
To list all Search Service Applications, use the following command.
Get-SPEnterpriseSearchServiceApplication
The properties and methods that allow you to manipulate the collection of content sources can be found by piping the output of the Get-SPEnterpriseSearchCrawlContentSource into the Get-Member cmdlet.
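For example, the following sketch lists those members (output will vary with your installation; $searchapp is assumed to hold a Search Service Application reference, as initialized earlier).

```powershell
# List the methods and properties of a content source object
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $searchapp |
    Get-Member -MemberType Method, Property;
```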
2.1. Scheduling Content Sources
The content source object
contains properties for both the full crawl and incremental crawl
schedules: FullCrawlSchedule and IncrementalCrawlSchedule. These
properties are references to a schedule object, which has its own
properties and methods. You can schedule both the full and the
incremental crawl for different types of periods. These are called schedule types,
and they dictate the type of schedule object created and referenced by
the FullCrawlSchedule and IncrementalCrawlSchedule properties. Following
are the available schedule object types.
Daily Use to specify the number of days between crawls.
Weekly Use to specify the number of weeks between crawls.
Monthly Use to specify the days of the month and months of the year when the crawl should occur.
MonthlyDayOfWeek Use to specify the days of the month, the weeks of the month, and the months of the year when the crawl should occur.
When full and
incremental crawl schedules are defined, the schedule object type and
the schedule details can be displayed as shown in the following examples
and sample output. If a schedule is not defined, then the following
commands will result in an error message, “You cannot call a method on a
null-valued expression.”
$csource = Get-SPEnterpriseSearchCrawlContentSource "Local SharePoint sites" `
-sea $searchapp;
$csource.FullCrawlSchedule.GetType();
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True False MonthlyDateSchedule Microsoft.Office....
$csource.FullCrawlSchedule;
DaysOfMonth : Day2
MonthsOfYear : January
BeginDay : 30
BeginMonth : 1
BeginYear : 2010
StartHour : 11
StartMinute : 15
RepeatDuration : 0
RepeatInterval : 0
Description : At 11:15 on day 2 of Jan, starting 30/01/2010
NextRunTime : 02/01/2011 11:15:00
$csource.IncrementalCrawlSchedule.GetType();
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True False DailySchedule Microsoft.Office....
$csource.IncrementalCrawlSchedule;
DaysInterval : 1
BeginDay : 29
BeginMonth : 11
BeginYear : 2009
StartHour : 0
StartMinute : 0
RepeatDuration : 0
RepeatInterval : 0
Description : At 00:00 every day, starting 29/11/2009
NextRunTime : 08/01/2010 00:00:00
By storing the schedule
object in a variable, you can change many of the properties, as you have
done for other object properties in previous examples; however, it is
easier to use the Set-SPEnterpriseSearchCrawlContentSource cmdlet. The parameter –ScheduleType
defines the type of crawl, which can be set as Full or Incremental.
There are then a set of parameters to configure the schedules,
summarized in Table 3.
Table 3. SharePoint 2010 Enterprise Search Full Crawl Schedule Parameters
PARAMETER NAME | NOTES
---|---
–DailyCrawlSchedule, –WeeklyCrawlSchedule, –MonthlyCrawlSchedule | Use to set the type of schedule. No value is necessary for these parameters; by specifying one of them you set the schedule type.
–CrawlScheduleDaysOfMonth | Specifies the days on which to crawl when the –MonthlyCrawlSchedule parameter is used.
–CrawlScheduleDaysOfWeek | Specifies the days on which to crawl when the –WeeklyCrawlSchedule parameter is used.
–CrawlScheduleMonthsOfYear | Specifies the months in which to crawl when the –MonthlyCrawlSchedule parameter is used. Valid values are 1 to 12.
–CrawlScheduleRunEveryInterval | The unit of the interval depends on the type of schedule. For example, if the schedule type is daily and the value of this parameter is 6, the schedule runs every 6 days.
–CrawlScheduleRepeatDuration | Use to specify the number of times to repeat the crawl schedule.
–CrawlScheduleRepeatInterval | Use to specify the number of minutes between each repeat interval.
–CrawlScheduleStartDateTime | Use to specify when the initial crawl should occur. This value cannot be set using the SharePoint 2010 Central Administration website. The default value is midnight on the current day.
To schedule a full crawl at
11:15 on day 2 of every month, starting 09/04/2010, for the content
source object referenced in the variable $csource, use the following command.
Set-SPEnterpriseSearchCrawlContentSource $csource `
-ScheduleType Full -MonthlyCrawlSchedule `
-CrawlScheduleDaysOfMonth 2 -CrawlScheduleStartDateTime "09/04/2010 11:15";
To schedule a full crawl at
01:45 every day starting 08/07/2010 for the Local SharePoint Sites
content source created in the Search Service Application referenced in
the variable $searchapp, use the following command.
Set-SPEnterpriseSearchCrawlContentSource "Local SharePoint Sites" -sea $searchapp `
-ScheduleType Full -DailyCrawlSchedule -CrawlScheduleStartDateTime `
"08/07/2010 01:45";
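The repeat parameters from Table 3 can be combined with a schedule type. The following sketch schedules an incremental crawl that starts at 01:00 each day and then repeats every 60 minutes, 12 times; the content source name, date, and interval values are illustrative.

```powershell
# Incremental crawl: daily at 01:00, repeated every 60 minutes, 12 times
Set-SPEnterpriseSearchCrawlContentSource "Local SharePoint Sites" -sea $searchapp `
    -ScheduleType Incremental -DailyCrawlSchedule `
    -CrawlScheduleRepeatInterval 60 -CrawlScheduleRepeatDuration 12 `
    -CrawlScheduleStartDateTime "08/07/2010 01:00";
```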
2.2. Creating and Deleting Content Sources
Content sources contain
data that refers to shared network folders, SharePoint sites, other
websites, Exchange public folders, third-party applications, databases,
and so on. These different types of content sources have different
connection properties, and the SharePoint 2010 Central Administration
Web pages prompt you appropriately for those properties. Each content
source object represents one content source of particular content source
type. To use Windows PowerShell to create a new content source object,
you use just one cmdlet, New-SPEnterpriseSearchCrawlContentSource,
no matter the type of content source object you want to create. Unlike
the Central Administration Web pages, this cmdlet does not prompt for
the relevant data; therefore, before you can successfully create content
sources in Windows PowerShell, you need to be familiar with creating
them in the browser.
The New-SPEnterpriseSearchCrawlContentSource –Type parameter specifies the indexing connector used to access data from the content source. The valid values for the –Type parameter are web, SharePoint, custom, LotusNotes, File, Exchange, O12Business, and Business
(for the Business Connectivity Service) and are used to create the
appropriate content source object. In this way, the content source
object is similar to the schedule object described in the previous
section—there are a number of types of content source objects, just as
there are a number of types of schedule objects.
You cannot specify the crawl schedule for the new content
source with New-SPEnterpriseSearchCrawlContentSource; therefore, you
will need to configure it after the content source is created, using Set-SPEnterpriseSearchCrawlContentSource.
The three mandatory parameters for New-SPEnterpriseSearchCrawlContentSource are –Name, –SearchApplication, and –Type.
The following example and sample output show how to create a content source for a file share.
New-SPEnterpriseSearchCrawlContentSource "Contoso File Share" `
-SearchApplication $searchapp -Type File `
-StartAddresses file://win08r2/foldershare1, file://win08r2/foldershare2;
Name Id Type CrawlState CrawlCompleted
---- -- ---- ---------- --------------
Contoso File ... 3 File Idle
The following example and sample output show how to create a content source for a website.
New-SPEnterpriseSearchCrawlContentSource "Old Intranet" `
-SearchApplication $searchapp -Type Web `
-StartAddresses "http://oldintranet.contoso.msft/";
Name Id Type CrawlState CrawlCompleted
---- -- ---- ---------- --------------
Old Intranet 14 Web Idle
The following example and sample output show how to remove the content source.
Remove-SPEnterpriseSearchCrawlContentSource "old intranet" `
-SearchApplication $searchapp;
Confirm
Are you sure you want to perform this action?
Performing operation "Remove-SPEnterpriseSearchCrawlContentSource" on Target
"Old Intranet".
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help
(default is "Y"):
2.3. Starting and Stopping Content Source Crawling
You can obtain information about the status of crawling of content
sources by using the properties CrawlState (alias for CrawlStatus),
CrawlCompleted, and CrawlStarted. You can initiate crawling of content
sources by using the methods StartFullCrawl and StartIncrementalCrawl.
If you initiate an incremental crawl before you have run a full crawl,
then the content source is fully crawled. There are also methods to
pause, resume, and stop crawling.
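Assuming $csource holds a content source object obtained as in the earlier examples, these methods can be called directly on the object; a minimal sketch follows.

```powershell
$csource.StartFullCrawl();   # Initiate a full crawl
$csource.PauseCrawl();       # Suspend the crawl currently in progress
$csource.ResumeCrawl();      # Resume a paused crawl
$csource.StopCrawl();        # Stop the current crawl
```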
Using the content source
methods and properties, you can create several useful administrative
scripts. The following script cycles through the collection of content
sources associated with the Contoso Search Service Application and
starts an incremental crawl for each content source that has a status
of Idle.
$searchContoso = Get-SPEnterpriseSearchServiceApplication |
where {$_.name -like 'cont*'};
$csources = $searchContoso | Get-SPEnterpriseSearchCrawlContentSource;
foreach ($csource in $csources) {
if ($csource.CrawlStatus -eq "Idle") {
Write-Host "Starting Incremental crawl for content source:" $csource.Name;
$csource.StartIncrementalCrawl();
} else {
Write-Host "Incremental Crawl not started:" $csource.name " Status:" `
$csource.CrawlState;
}
}
In the next example, the script starts a full crawl for the content source associated with a specific Web application.
# Save the name of the web application in a variable
$webapp = "http://intranet.contoso.msft"
# Get a list of Search Service Application proxies associated with the web application
$sp = (Get-SPWebApplication $webapp).ServiceApplicationProxyGroup.Proxies |
where { $_.typename -like 'search*'};
# Get a list of content sources associated with the Search Service Applications
# that are set to crawl the web application. Wildcard comparison is included to
# check for any fully qualified domain name or any subsites or site collections
# that are specifically included in a content source.
$csources = foreach ($searcha in $sp) {
Get-SPEnterpriseSearchServiceApplication |
where { $_.name -eq $searcha.GetSearchApplicationName()} |
Get-SPEnterpriseSearchCrawlContentSource |
where {$_.StartAddresses -like $webapp + "*"};
}
# Start a full crawl for any content source whose status is equal to Idle
foreach ($csource in $csources) {
    if ($csource.CrawlStatus -eq "Idle") {
        $csource.StartFullCrawl();
    }
}
If you find that you are
typing the same sets of commands again and again, you should enclose
those commands in a function or store them in a file with the extension
.ps1, known as a Windows
PowerShell script. This allows you to reuse the set of commands without
re-entering each command. For example, copy the set of commands shown
in the previous example below the comment line in the following script.
Because the function accepts the variable $webapp as its input parameter, delete the line that initializes the $webapp variable from the copied code.
Function Start-FullCrawlWebApp ([string]$webapp) {
# Copy script below this line
} # End of Start-FullCrawlWebApp Function
Removing all comment lines (to save space here), your function should look similar to the following function.
Function Start-FullCrawlWebApp ([string]$webapp) {
    $sp = (Get-SPWebApplication $webapp).ServiceApplicationProxyGroup.Proxies |
        where { $_.typename -like 'search*'};
    $csources = foreach ($searcha in $sp) {
        Get-SPEnterpriseSearchServiceApplication |
            where { $_.name -eq $searcha.GetSearchApplicationName()} |
            Get-SPEnterpriseSearchCrawlContentSource |
            where {$_.StartAddresses -like $webapp + "*"};
    }
    foreach ($csource in $csources) {
        if ($csource.CrawlStatus -eq "Idle") {
            $csource.StartFullCrawl();
        }
    }
} # End of Start-FullCrawlWebApp Function
The type [string] placed before the input variable $webapp
ensures that the function receives the correct type of data. Prefixing an input
variable with a type constraint is optional; when present, it generates an
error if the function is called with data that cannot be converted to that type.
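The behavior of a type constraint can be seen with a minimal sketch that is independent of SharePoint; the function name Show-Type is hypothetical and used only for illustration.

```powershell
Function Show-Type ([int]$number) { $number.GetType().Name }

Show-Type 42        # Returns Int32
Show-Type "42"      # The string is converted to Int32, so this also succeeds
Show-Type "forty"   # Generates an error: the value cannot be converted to Int32
```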
To start a full crawl of the Web application http://www.contoso.msft, use the following command.
Start-FullCrawlWebApp -webapp http://www.contoso.msft